89 research outputs found

    A User-Friendly Hybrid Sparse Matrix Class in C++

    When implementing functionality which requires sparse matrices, there are numerous storage formats to choose from, each with advantages and disadvantages. To achieve good performance, several formats may need to be used in one program, requiring explicit selection and conversion between the formats. This can be both tedious and error-prone, especially for non-expert users. Motivated by this issue, we present a user-friendly sparse matrix class for the C++ language, with a high-level application programming interface deliberately similar to the widely used MATLAB language. The class internally uses two main approaches to achieve efficient execution: (i) a hybrid storage framework, which automatically and seamlessly switches between three underlying storage formats (compressed sparse column, coordinate list, red-black tree) depending on which format is best suited for specific operations, and (ii) template-based meta-programming to automatically detect and optimise execution of common expression patterns. To facilitate relatively quick conversion of research code into production environments, the class and its associated functions provide a suite of essential sparse linear algebra functionality (e.g., arithmetic operations, submatrix manipulation) as well as high-level functions for sparse eigendecompositions and linear equation solvers. The latter are achieved by providing easy-to-use abstractions of the low-level ARPACK and SuperLU libraries. The source code is open and provided under the permissive Apache 2.0 license, allowing unencumbered use in commercial products.
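To make the format trade-off concrete, the following is a minimal sketch of converting a coordinate list into compressed sparse column (CSC) arrays. It is not the class described in the abstract, which is written in C++; the function names `coo_to_csc` and `csc_column` are hypothetical and chosen only for illustration. The sketch assumes that duplicate entries are summed, which is a common convention but not one stated in the abstract.

```python
def coo_to_csc(n_rows, n_cols, entries):
    """Convert coordinate-list triplets (row, col, value) to CSC arrays.

    Returns (values, row_indices, col_ptr), where column j occupies the
    half-open slice col_ptr[j]:col_ptr[j + 1] of values and row_indices.
    Duplicate (row, col) entries are summed (an assumed convention).
    """
    # Key by (col, row) so that sorting the keys yields CSC order directly.
    summed = {}
    for row, col, value in entries:
        if not (0 <= row < n_rows and 0 <= col < n_cols):
            raise IndexError(f"entry ({row}, {col}) is outside the matrix")
        summed[(col, row)] = summed.get((col, row), 0.0) + value

    values, row_indices = [], []
    col_ptr = [0] * (n_cols + 1)
    for (col, row), value in sorted(summed.items()):
        values.append(value)
        row_indices.append(row)
        col_ptr[col + 1] += 1  # count the stored entries in each column

    # A prefix sum turns per-column counts into column start offsets.
    for j in range(n_cols):
        col_ptr[j + 1] += col_ptr[j]
    return values, row_indices, col_ptr


def csc_column(values, row_indices, col_ptr, j):
    """Return the stored (row, value) pairs of column j."""
    start, end = col_ptr[j], col_ptr[j + 1]
    return list(zip(row_indices[start:end], values[start:end]))
```

Appending a triplet to the coordinate list is cheap, whereas inserting one element into the CSC arrays shifts everything stored after it. Reading a whole column from CSC, by contrast, is a single contiguous slice. This asymmetry is one reason a program may want more than one format.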

    Bayesian spatial extreme value analysis of maximum temperatures in County Dublin, Ireland

    In this study, we begin a comprehensive characterisation of temperature extremes in Ireland for the period 1981-2010. We produce return levels of anomalies of daily maximum temperature extremes for an area over Ireland for this 30-year period. We employ extreme value theory (EVT) to model the data using the generalised Pareto distribution (GPD) as part of a three-level Bayesian hierarchical model. We use predictive processes in order to solve the computationally difficult problem of modelling data over a very dense spatial field. To our knowledge, this is the first study to combine predictive processes and EVT in this manner. The model is fit using Markov chain Monte Carlo (MCMC) algorithms. Posterior parameter estimates and return level surfaces are produced, in addition to specific site analysis at synoptic stations, including Casement Aerodrome and Dublin Airport. Observational data from the period 2011-2018 are included in this site analysis to determine if there is evidence of a change in the observed extremes. An increase in the frequency of extreme anomalies, but not the severity, is observed for this period. We found that the frequency of observed extreme anomalies from 2011 to 2018 at the Casement Aerodrome and Phoenix Park synoptic stations exceeds the upper bounds of the credible intervals from the model by 20% and 7%, respectively.
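For readers unfamiliar with return levels under a GPD threshold model, the standard closed-form expression can be sketched as below. This is the textbook point formula only, not the study's Bayesian hierarchical model, in which the parameters vary over space and carry posterior uncertainty. The function name and argument names are hypothetical.

```python
import math


def gpd_return_level(threshold, scale, shape, exceed_rate, n_obs):
    """Level exceeded on average once every n_obs observations.

    threshold:   GPD threshold u
    scale:       GPD scale parameter sigma (> 0)
    shape:       GPD shape parameter xi
    exceed_rate: probability that a single observation exceeds u
    n_obs:       return period expressed in number of observations
                 (for daily data, years times observations per year)
    """
    if scale <= 0:
        raise ValueError("scale must be positive")
    if not 0 < exceed_rate <= 1:
        raise ValueError("exceed_rate must lie in (0, 1]")
    m = n_obs * exceed_rate  # expected number of exceedances in n_obs draws
    if m < 1:
        raise ValueError("return period too short to reach the threshold")
    if abs(shape) < 1e-12:
        # Limit of the general formula as the shape parameter tends to 0.
        return threshold + scale * math.log(m)
    return threshold + (scale / shape) * (m ** shape - 1.0)
```

In a Bayesian setting, one would typically evaluate such a function on every MCMC draw of the parameters to obtain a posterior distribution, and hence credible intervals, for the return level.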

    Assessing United States county-level exposure for research on tropical cyclones and human health

    Includes bibliographical references (pages 067007-12-067007-13).
    Background: Tropical cyclone epidemiology can be advanced through exposure assessment methods that are comprehensive and consistent across space and time, as these facilitate multiyear, multistorm studies. Further, an understanding of patterns in and between exposure metrics that are based on specific hazards of the storm can help in designing tropical cyclone epidemiological research.
    Objectives: a) Provide an open-source data set for tropical cyclone exposure assessment for epidemiological research; and b) investigate patterns and agreement between county-level assessments of tropical cyclone exposure based on different storm hazards.
    Methods: We created an open-source data set with data at the county level on exposure to four tropical cyclone hazards: peak sustained wind, rainfall, flooding, and tornadoes. The data cover all eastern U.S. counties for all land-falling or near-land Atlantic basin storms, covering 1996–2011 for all metrics and up to 1988–2018 for specific metrics. We validated measurements against other data sources and investigated patterns and agreement among binary exposure classifications based on these metrics, as well as compared them to use of distance from the storm’s track, which has been used as a proxy for exposure in some epidemiological studies.
    Results: Our open-source data set was typically consistent with data from other sources, and we present and discuss areas of disagreement and other caveats. Over the study period and area, tropical cyclones typically brought different hazards to different counties. Therefore, when comparing exposure assessment between different hazard-specific metrics, agreement was usually low, as it also was when comparing exposure assessment based on a distance-based proxy measurement and any of the hazard-specific metrics.
    Discussion: Our results provide a multihazard data set that can be leveraged for epidemiological research on tropical cyclones, as well as insights that can inform the design and analysis for tropical cyclone epidemiological research.

    Improving gene-set enrichment analysis of RNA-Seq data with small replicates

    Deregulated pathways identified from transcriptome data of two sample groups have played a key role in many genomic studies. Gene-set enrichment analysis (GSEA) has been commonly used for pathway or functional analysis of microarray data, and it is also being applied to RNA-seq data. However, most RNA-seq data so far have only a small number of replicates. This forces the use of the gene-permuting GSEA method (or preranked GSEA), which results in a large number of false positives due to the inter-gene correlation in each gene-set. We demonstrate that incorporating the absolute gene statistic in one-tailed GSEA considerably improves the false-positive control and the overall discriminatory ability of the gene-permuting GSEA methods for RNA-seq data. To test the performance, a simulation method to generate correlated read counts within a gene-set was newly developed, and a dozen currently available RNA-seq enrichment analysis methods were compared, where the proposed methods outperformed others that do not account for the inter-gene correlation. Analysis of real RNA-seq data also supported the proposed methods in terms of false-positive control, ranks of true positives and biological relevance. An efficient R package (AbsFilterGSEA) coded with C++ (Rcpp) is available from CRAN.
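As a rough illustration of what using the absolute gene statistic in a one-tailed enrichment score can look like, the sketch below computes a weighted running-sum score in the style of classic GSEA. It is an assumption-laden toy, not the AbsFilterGSEA implementation: the function name, the `weight` parameter, and the choice to take the maximum positive deviation are all illustrative, and the permutation step that produces p-values is omitted.

```python
def enrichment_score(gene_stats, gene_set, use_absolute=True, weight=1.0):
    """One-tailed running-sum enrichment score for a single gene set.

    gene_stats: dict mapping gene name -> differential expression statistic
    gene_set:   iterable of gene names forming the set to be tested
    """
    # Optionally replace each statistic by its absolute value before ranking.
    scores = {g: (abs(s) if use_absolute else s) for g, s in gene_stats.items()}
    ranked = sorted(scores, key=scores.get, reverse=True)

    members = set(gene_set) & set(scores)
    n_miss = len(ranked) - len(members)
    hit_total = sum(abs(scores[g]) ** weight for g in members)
    if not members or n_miss == 0 or hit_total == 0:
        raise ValueError("gene set must be a proper, non-trivial subset")

    running, best = 0.0, 0.0
    for gene in ranked:
        if gene in members:
            running += abs(scores[gene]) ** weight / hit_total  # weighted hit
        else:
            running -= 1.0 / n_miss  # uniform penalty for a miss
        best = max(best, running)  # one-tailed: keep positive deviation only
    return best
```

With signed statistics, a set whose genes change in opposite directions is split between the two ends of the ranking. Ranking by absolute value places all strongly changed genes at the top regardless of direction.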